Improvement of Tone Intelligibility for Average-Voice-Based Thai Speech Synthesis

نویسنده

  • Suphattharachai Chomphan
چکیده

Problem statement: Tone intelligibility in speech synthesis is an important attribute that should be taken into account. The tone correctness of the synthetic speech is degraded considerably in the average-voice-based HMM-based Thai speech synthesis. The tying mechanism in the decision tree based context clustering without appropriate criterion causes unexpected tone neutralization. Incorporation of the phrase intonation to the context clustering process in the training stage was proposed early. However, the tone correctness is not satisfied. Approach: This study proposes a number of tonal features including tone-geometrical features and phrase intonation features to be exploited in the context clustering process of HMM training stage. Results: In the experiments, subjective evaluations of both average voice and adapted voice in terms of the intelligibility of tone are conducted. Effects on decision trees of the extracted features are also evaluated. By considering gender in training speech, two core experiments were conducted. The first experiment shows that the proposed tonal features can improve the tone intelligibility for female speech model above that of male speech model, while the second experiment shows that the proposed tonal features improve the tone intelligibility for gender dependent model than for gender independent model. Conclusion: All of the experimental results confirm that the tone correctness of the synthesized speech from the averagevoice-based HMM-based Thai speech synthesis is significantly improved when using most of the extracted features.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards the Development of Speaker-Dependent and Speaker-Independent Hidden Markov Model-Based Thai Speech Synthesis

Problem statement: Tone distortion in Thai languages can deteriorate not only the intelligibility of speech but also its naturalness. Therefore, the correctness of tone must be carefully taken into account in continuous speech synthesis. The preliminary work confronted this problem when applying HMM-based speech synthesis to Thai. Approach: This study presented a study on speaker-dependent and ...

متن کامل

Tone Question of Tree Based Context Clustering for Hidden Markov Model Based Thai Speech Synthesis

Problem statement: In HMM-based Thai speech synthesis, tone is an important issue that brings about the intelligibility of the synthesized speech. Tone distortion resulted from imbalance of the training data should be appropriately treated. Approach: This study described an HMM-based speech synthesis system for Thai language. In the system, spectrum, pitch and state duration are modeled simulta...

متن کامل

Study on Pitch Contour of Thai Polysyllabic Tone Sequences Using a Generative Model

Thai speech synthesis by rule has been developed. In order to synthesize F0 contours of Thai tones, the generative model of F0 contours (Fujisaki's model) for tonal languages is applied. Along with our method, the pitch contours of Thai polysyllabic words were analyzed. Rules are derived and applied to synthesize Thai polysyllabic tone sequences. We performed listening tests to evaluate intelli...

متن کامل

Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model

As classic and intrinsic requirements, synthetic speech need to convey correct information with good quality of naturalness to listeners. Fundamental frequency (F0) contours need to be controlled to meet these requirements. Additional challenges have been introduced to tonal languages because the F0 contour reflects both intelligibility and naturalness of the speech. According to the fact that ...

متن کامل

Improvement of prosodic characteristic in Vietnamese speech synthesis system base on HMM

The key factors helping people to understand the synthesized voices of text-to-speech system are the naturalness and the intelligibility. However, making more natural voices remains a difficult task because of the speech data’s scarcity. With data limited corpus, prosodic information such as tone, intonation, Part-of-Speech is added to ensure the quality of synthetic speech. In the paper, we in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012